35 research outputs found

    Compulsory Flow Q-Learning: an RL algorithm for robot navigation based on partial-policy and macro-states

    Reinforcement Learning is carried out on-line, through trial-and-error interactions of the agent with the environment, which can be very time consuming for robots. In this paper we contribute a new learning algorithm, CFQ-Learning, which uses macro-states, a low-resolution discretisation of the state space, and a partial-policy to get around obstacles, both based on the complexity of the environment structure. The use of macro-states can prevent convergence of learning algorithms, but accelerates the learning process; on the other hand, partial-policies can guarantee that an agent fulfils its task even when acting over macro-states. Experiments show that CFQ-Learning achieves a good balance between policy quality and learning rate.
    Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior (CAPES), GRICES, FAPESP, CNP
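    The CFQ-Learning algorithm itself is not specified in this abstract, but the core idea of learning over a low-resolution discretisation can be illustrated with ordinary tabular Q-learning on a toy grid. Everything below (grid size, rewards, cell size) is invented for illustration; only the macro-state idea comes from the abstract.

    ```python
    import random

    def macro_state(x, y, cell=4):
        # Coarse discretisation: several fine grid positions share one macro-state.
        return (x // cell, y // cell)

    def q_learning_macro(episodes=200, size=8, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
        rng = random.Random(seed)
        actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
        goal = (size - 1, size - 1)
        Q = {}  # macro-state -> list of action values
        for _ in range(episodes):
            x, y = 0, 0
            for _ in range(100):
                s = macro_state(x, y)
                qs = Q.setdefault(s, [0.0] * len(actions))
                if rng.random() < eps:
                    a = rng.randrange(len(actions))
                else:
                    a = max(range(len(actions)), key=lambda i: qs[i])
                dx, dy = actions[a]
                x2 = min(max(x + dx, 0), size - 1)
                y2 = min(max(y + dy, 0), size - 1)
                r = 1.0 if (x2, y2) == goal else -0.01
                q2 = Q.setdefault(macro_state(x2, y2), [0.0] * len(actions))
                qs[a] += alpha * (r + gamma * max(q2) - qs[a])  # standard Q-update
                x, y = x2, y2
                if (x, y) == goal:
                    break
        return Q
    ```

    The table is keyed by macro-states rather than fine positions, so it is much smaller; the trade-off noted in the abstract is that states aggregated into one macro-state can require different actions, which is what the partial-policy mechanism addresses.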

    General detection model in cooperative multirobot localization

    The cooperative multirobot localization problem consists in localizing each robot in a group within the same environment, with the robots sharing information in order to improve localization accuracy. Localization can be refined when a robot detects and identifies another one and measures their relative distance; at that moment, both robots can use the detection information to update their own pose beliefs. However, other useful information besides a single detection between a pair of robots can be used to update pose beliefs: propagation of a single detection to non-participating robots, absence of detections, and detections involving more than a pair of robots. A general detection model is proposed in order to aggregate all detection information, addressing the problem of updating pose beliefs in all the situations described. Experimental results in a simulated environment with groups of robots show that the proposed model improves localization accuracy when compared to conventional single-detection multirobot localization.
    FAPESP, CNP
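    The paper's general detection model is not reproduced in this abstract; the sketch below only illustrates the basic single-detection update that it generalises, on a hypothetical 1-D grid of cells with an invented noise parameter. Robot B's belief is reweighted by how well each cell agrees with robot A's belief and the measured relative distance.

    ```python
    def normalize(belief):
        total = sum(belief)
        return [p / total for p in belief]

    def detection_update(belief_a, belief_b, measured_dist, noise=0.8):
        # Likelihood that B is in cell j, given A's belief and a measured
        # relative distance d: mass `noise` where |i - j| == d, the rest uniform.
        n = len(belief_b)
        new_b = []
        for j in range(n):
            like = 0.0
            for i, pa in enumerate(belief_a):
                like += pa * (noise if abs(i - j) == measured_dist else (1 - noise) / n)
            new_b.append(belief_b[j] * like)
        return normalize(new_b)
    ```

    With A certain of its cell and B initially uniform, the posterior for B concentrates at the cells consistent with the measured distance; the paper's model extends this kind of update to propagated, absent, and multi-robot detections.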

    Markov decision processes for ad network optimization

    In this paper we examine a central problem in a particular advertising scheme: we are concerned with matching marketing campaigns that produce advertisements ("ads") to impressions, where "impression" is a general term for any space on the internet that can display an ad. We propose a new take on the problem by resorting to planning techniques based on Markov Decision Processes and to plan-generation techniques that have been developed in the AI literature. We present a detailed formulation of the Markov Decision Process approach and results of simulated experiments.
    Anna Helena Reali Costa and Fábio Gagliardi Cozman are partially supported by CNPq. Flávio Sales Truzzi is supported by CAPES. The work reported here has received substantial support through FAPESP grant 2008/03995-5 and FAPESP grant 2011/19280-
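    The paper's actual MDP formulation is not given in this abstract. As a generic illustration of the planning machinery it refers to, here is plain value iteration on an invented two-impression-type toy model; the states, campaign names, and payoffs are all made up for illustration.

    ```python
    def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-6):
        # V(s) <- max_a sum_s' P(s'|s,a) * [R(s,a,s') + gamma * V(s')]
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                best = max(
                    sum(p * (reward(s, a, s2) + gamma * V[s2])
                        for s2, p in transition(s, a).items())
                    for a in actions(s)
                )
                delta, V[s] = max(delta, abs(best - V[s])), best
            if delta < tol:
                return V

    # Toy ad model: a state is the type of the arriving impression, an action is
    # the campaign whose ad is shown, and the next impression type is random.
    states = ["young", "old"]
    campaigns = lambda s: ["sports_ad", "news_ad"]
    transition = lambda s, a: {"young": 0.5, "old": 0.5}
    payoff = {("young", "sports_ad"): 1.0, ("young", "news_ad"): 0.2,
              ("old", "sports_ad"): 0.1, ("old", "news_ad"): 0.5}
    reward = lambda s, a, s2: payoff[(s, a)]
    V = value_iteration(states, campaigns, transition, reward)
    ```

    The resulting values reflect the best immediate payoff per impression type plus the discounted value of future impressions, which is the kind of look-ahead that distinguishes MDP planning from greedy campaign matching.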

    Reinforcement Learning Applied to Trading Systems: A Survey

    Financial domain tasks, such as trading in market exchanges, are challenging and have long attracted researchers. The recent achievements and consequent notoriety of Reinforcement Learning (RL) have also increased its adoption in trading tasks. RL uses a framework with well-established formal concepts, which raises its attractiveness for learning profitable trading strategies. However, using RL without due attention in the financial area can lead new researchers to ignore standards or to fail to adopt relevant conceptual guidelines. In this work, we embrace the seminal RL technical fundamentals, concepts, and recommendations to perform a unified, theoretically grounded examination and comparison of previous research that can serve as a structuring guide for the field of study. A selection of twenty-nine articles, drawn from a large volume of available studies, was reviewed under a classification that considers RL's most common formulations and design patterns. This classification allowed for precise inspection of the most relevant aspects regarding data input, preprocessing, state and action composition, adopted RL techniques, evaluation setups, and overall results. Our analysis, organized around fundamental RL concepts, allowed for a clear identification of current system-design best practices, gaps that require further investigation, and promising research opportunities. Finally, this review attempts to promote the development of this field of study by facilitating researchers' adherence to standards and helping them avoid straying from the RL constructs' firm ground.
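    As a concrete illustration of the state, action, and reward design choices such a survey classifies, here is a deliberately minimal trading-environment sketch. The price-window state, three-action position scheme, and mark-to-market reward are common textbook choices, not a design taken from any particular surveyed paper.

    ```python
    class TradingEnv:
        """Minimal RL trading environment: the state is a window of recent
        prices, the action sets a target position, the reward is the
        mark-to-market profit-and-loss over one step."""

        ACTIONS = ("sell", "hold", "buy")  # mapped to positions -1, 0, +1

        def __init__(self, prices, window=3):
            assert len(prices) > window
            self.prices, self.window = prices, window
            self.t = window - 1          # index of the latest observed price
            self.position = 0

        def state(self):
            # Observation: the last `window` prices (a common minimal choice).
            return tuple(self.prices[self.t - self.window + 1 : self.t + 1])

        def step(self, action):
            self.position = action - 1   # action index 0/1/2 -> -1/0/+1
            reward = self.position * (self.prices[self.t + 1] - self.prices[self.t])
            self.t += 1
            done = self.t + 1 >= len(self.prices)
            return self.state(), reward, done
    ```

    Most of the design variation the survey catalogues lives in exactly these three slots: richer observations (indicators, order-book features), richer action spaces (order sizes), and risk-adjusted rewards instead of raw PnL.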

    Realidade Virtual: Estereoscopia na Educação

    Virtual reality (VR) in education is a topic strongly present in research institutions in several countries. This article discusses the application of VR techniques, including the use of computer graphics and the production of three-dimensional videos with specialised yet low-cost equipment for educational institutions. Stereoscopy is the key to visualising these applications. For the development of the project, a 3D lens, a consumer camera, low-cost projectors, polarised light filters, and passive 3D glasses were used. The goal of producing the 3D video was to evaluate everything from the processes involved in scripting, recording, and exhibition to the costs required for an educational institution to adopt virtual reality resources to enhance learning.

    Speeding-up reinforcement learning through abstraction and transfer learning

    We are interested in the following general question: is it possible to abstract knowledge that is generated while learning the solution of a problem, so that this abstraction can accelerate the learning process? Moreover, is it possible to transfer and reuse the acquired abstract knowledge to accelerate the learning process for future similar tasks? We propose a framework for conducting two levels of reinforcement learning simultaneously, where an abstract policy is learned alongside a concrete policy for the problem, such that both policies are refined through exploration and interaction of the agent with the environment. We explore abstraction both to accelerate the learning of an optimal concrete policy for the current problem and to allow the application of the generated abstract policy in learning solutions for new problems. We report experiments in a robot navigation environment that show our framework to be effective in speeding up policy construction for practical problems and in generating abstractions that can be used to accelerate learning in new similar problems.
    This research was partially supported by FAPESP (2011/19280-8, 2012/02190-9, 2012/19627-0) and CNPq (311058/2011-6, 305395/2010-6)
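    The framework's details are not given in this abstract, but its central mechanism, one stream of experience refining two policies at once, can be sketched as a pair of Q-tables driven by the same update, one keyed by concrete states and one keyed through a hypothetical abstraction function `phi` (the abstraction used here is invented for illustration):

    ```python
    def update_both(Qc, Qa, phi, s, a, r, s2, alpha=0.5, gamma=0.9, n_actions=4):
        # One experience tuple (s, a, r, s2) updates the concrete table Qc and,
        # through the abstraction phi, the abstract table Qa simultaneously.
        for Q, st, st2 in ((Qc, s, s2), (Qa, phi(s), phi(s2))):
            q = Q.setdefault(st, [0.0] * n_actions)
            q2 = Q.setdefault(st2, [0.0] * n_actions)
            q[a] += alpha * (r + gamma * max(q2) - q[a])
    ```

    The abstract table aggregates experience from many concrete states, so it becomes informative sooner; on a new, similar task it can be reused to bias early action selection while a fresh concrete table is learned from scratch.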

    DEBACER: a method for slicing moderated debates

    Subjects change frequently in moderated debates with several participants, such as parliamentary sessions, electoral debates, and trials. Partitioning a debate into blocks that share the same subject is essential for understanding. Often a moderator is responsible for defining when a new block begins, so the task of automatically partitioning a moderated debate can focus solely on the moderator's behavior. In this paper, we (i) propose a new algorithm, DEBACER, which partitions moderated debates; (ii) carry out a comparative study between conventional and BERTimbau pipelines; and (iii) validate DEBACER by applying it to the minutes of the Assembly of the Republic of Portugal. Our results show the effectiveness of DEBACER.
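    The slicing strategy described, watching only the moderator's utterances for block boundaries, can be sketched as follows. The classifier `is_block_start` is a stand-in for whatever model (conventional or BERTimbau-based) judges whether a moderator utterance opens a new subject; the keyword rule in the usage below is purely illustrative.

    ```python
    def slice_debate(utterances, is_block_start):
        """Partition a debate into blocks. `utterances` is a list of
        (speaker, text) pairs; a new block opens whenever the moderator's
        utterance is judged to start a new subject."""
        blocks, current = [], []
        for speaker, text in utterances:
            if speaker == "moderator" and is_block_start(text) and current:
                blocks.append(current)
                current = []
            current.append((speaker, text))
        if current:
            blocks.append(current)
        return blocks
    ```

    Because only moderator turns are tested, the other participants' utterances never trigger a split, which mirrors the paper's observation that the task can focus solely on the moderator's behavior.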

    AVALIAÇÃO DE POLÍTICAS ABSTRATAS NA TRANSFERÊNCIA DE CONHECIMENTO EM NAVEGAÇÃO ROBÓTICA

    This paper presents a new approach to the problem of solving a new task by using knowledge previously acquired while solving a similar task in the same domain, robot navigation. A new algorithm, Qab-Learning, is proposed to obtain the abstract policy that guides the agent in the task of reaching a goal location from any other location in the environment, and this policy is compared to the policy derived from another algorithm from the literature, ND-TILDE. The policies are applied to a number of different tasks in two environments. The results show that the policies, even after the abstraction process, have a positive impact on the performance of the agent.